Visual Acoustic vs. Aural Perceptual Speaker Identification in a Closed Set of Disguised Voices

Author

Jonas Lindh

Abstract

Many studies of automatic speaker recognition have investigated which parameters perform best. This paper presents an experiment in which graphic representations of LTAS (Long Time Average Spectrum) were used to identify speakers from a closed set of disguised voices, in order to determine how well the graphic method performed compared with an aural approach. Nine different speakers were recorded uttering a fake threat. The speakers used different disguises, such as dialect, accent, whisper and falsetto, and also produced the verbatim "threat" in a normal voice. With high-quality recordings, visual comparison of the Praat "vocal tract" graphs of LTAS outperformed the aural approach in identifying the disguised voices. Performing speaker identification aurally does not mean analyzing a different sample from the one analyzed acoustically. Studies of aural perception describe a hypothesis-driven, top-down, active process, which raises interesting questions regarding aural speaker identification from poor-quality recordings with noisy backgrounds. However, more tests on telephone-quality recordings, authentic material and additional types of acoustic measurements must be performed before anything can be said about the forensic implications of LTAS.

Background and Introduction

The so-called "voiceprint" approach introduced by Lawrence Kersta (1962) suggested a pattern-matching procedure comparing broadband spectrograms for speaker identification purposes. It is within this context that an interest in studying visual vs. aural methods arose. Since complex visual pattern matching activates the right hemisphere of the brain, whereas speech and language processes usually activate the left (Rose, 2002), it would be preferable to find a way to integrate both. There are many problems to consider when using visual representations of acoustic data in forensic speaker identification, especially the effects of low-quality recordings.
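In the experiment, the LTAS graphs were compared visually by human judges. Purely as a hypothetical illustration of what a closed-set decision involves, the Python sketch below stores one LTAS curve per known speaker, measures a root-mean-square level difference in dB against a questioned curve, and picks the nearest reference. The speaker names and dB values are invented, and the RMS distance is an assumed metric; it is not the author's procedure or a Praat feature.

```python
import math

# Hypothetical LTAS curves: dB levels in six frequency bands per speaker.
# These numbers are invented for illustration; they are not data from the study.
REFERENCES = {
    "speaker_A": [-20.0, -25.0, -32.0, -40.0, -45.0, -52.0],
    "speaker_B": [-18.0, -30.0, -28.0, -38.0, -50.0, -55.0],
    "speaker_C": [-22.0, -24.0, -35.0, -36.0, -44.0, -58.0],
}

def rms_difference_db(a, b):
    """Root-mean-square difference (dB) between two equally banded LTAS curves."""
    if len(a) != len(b):
        raise ValueError("LTAS curves must share the same frequency bands")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def identify(questioned, references):
    """Closed-set decision: return the reference speaker with the nearest LTAS."""
    return min(references, key=lambda name: rms_difference_db(questioned, references[name]))

# A hypothetical questioned sample that deviates slightly from speaker_B.
questioned = [-19.0, -29.0, -29.0, -37.0, -49.0, -56.0]
print(identify(questioned, REFERENCES))  # speaker_B
```

Because the set is closed, `identify` always returns one of the reference speakers; it has no "none of the above" outcome, which is exactly the assumption a closed-set design makes.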
Generally, aural identification has been the primary method in casework. Many studies have investigated which parameters are most stable, or how the effects of low recording quality can be calculated, for example the telephone effect (Künzel, 2001). LTAS generally becomes rather stable after 30-40 seconds of speech (Boves, 1984; Fritzell et al., 1974; Keller, 2004). LTAS reflects, on average, the energy peaks and troughs generated by the vocal tract filter, which means that it should be more difficult to alter than, for example, F0 or specific phones; this is why the measure is often chosen to represent visually the general distribution of energy across frequency in the speech signal. Several studies have examined energy ratios and level differences in LTAS (Löfqvist, 1986; Löfqvist & Mandersson, 1987; Gauffin & Sundberg, 1977; Kitzing, 1986). Kitzing (1986) recommended that patients read at the same degree of vocal loudness to avoid the differences that occurred, especially in the higher frequencies. Kitzing & Åkerlund (1993) pointed out the need to investigate the effect of vocal loudness on LTAS curves. Nordenberg & Sundberg (2003) performed such a test and showed that vocal loudness and varied F0 produced variations in Long Time Average Spectra. However, despite this expected variation, pattern matching on the graphs still appears to be possible. It has been observed that identification results differ slightly between subjects depending on whether they consider distance more important than shape/pattern. Hollien & Majewski (1977) tested long-term spectra (LTS) as a means of speaker identification under three speaking conditions: normal, under stress and disguised. LTS for fifty American and fifty Polish male speakers were used under fullband as well as passband conditions.
The results demonstrated high levels of correct identification (especially under fullband conditions) for normal speech with degrading results for stress and disguise.
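To make the distinction between level ("distance") and shape/pattern concrete, here is a minimal pure-Python sketch under stated assumptions: LTAS is computed as the mean power spectrum over Hann-windowed frames (a naive DFT keeps it dependency-free), distance as the RMS dB difference between two curves, and shape similarity as the Pearson correlation between them. The frame length, hop size, sample rate and synthetic signals are hypothetical choices, not settings from the study or from Praat.

```python
import cmath
import math
import random

def ltas_db(signal, frame_len=64, hop=32):
    """Long-term average spectrum (dB): mean power spectrum over Hann-windowed frames."""
    # Periodic Hann window to reduce spectral leakage between frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequency bins for a real signal
    power_sum = [0.0] * n_bins
    n_frames = 0
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        for k in range(n_bins):
            # Naive DFT of bin k: slow, but avoids third-party dependencies.
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            power_sum[k] += abs(coeff) ** 2
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    # Average over frames, then convert to dB (a tiny floor avoids log10(0)).
    return [10 * math.log10(p / n_frames + 1e-12) for p in power_sum]

def level_distance_db(a, b):
    """'Distance': RMS difference in dB between two LTAS curves."""
    if len(a) != len(b):
        raise ValueError("curves must have the same number of bins")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def shape_correlation(a, b):
    """'Shape/pattern': Pearson correlation, insensitive to an overall level offset."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(dev_a, dev_b))
    return num / math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))

# Hypothetical synthetic signals (not recordings): two partials plus weak noise.
FS = 8000  # assumed sample rate in Hz
rng = random.Random(0)
voice = [math.sin(2 * math.pi * 200 * n / FS) + 0.5 * math.sin(2 * math.pi * 1100 * n / FS)
         + 0.05 * rng.gauss(0.0, 1.0) for n in range(2048)]
louder = [2.0 * x for x in voice]  # same shape, level raised by 20*log10(2) dB
other = [math.sin(2 * math.pi * 350 * n / FS) + 0.5 * math.sin(2 * math.pi * 2500 * n / FS)
         + 0.05 * rng.gauss(0.0, 1.0) for n in range(2048)]

ltas_voice, ltas_louder, ltas_other = ltas_db(voice), ltas_db(louder), ltas_db(other)
print(round(level_distance_db(ltas_voice, ltas_louder), 2))  # about 6.02 dB apart
print(round(shape_correlation(ltas_voice, ltas_louder), 4))  # shape essentially identical
print(round(shape_correlation(ltas_voice, ltas_other), 4))   # different shape, lower value
```

Doubling the amplitude raises every bin by the same amount, so a judge who weights distance sees two different curves while a judge who weights shape sees a match. Real changes in vocal loudness do not shift the spectrum uniformly (Kitzing, 1986, noted differences especially in the higher frequencies), so the uniform shift here is a deliberate simplification.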


Similar resources

Speaker Recognition of Disguised Voices: a Program for Research

A program for carrying out research on the speaker recognition of disguised voices is proposed. Such a program would consist of the following: 1) Definition and classification of disguises 2) Creation of databases of disguised voices 3) Testing of conventional speaker recognition systems on the disguised-voice database 4) The investigation of which basic methods of modeling the vocal tract -HMM...


Evaluation of speaker mimic technology for personalizing SGD voices

In this paper, we demonstrate the use of state-of-the-art speech technology to transform speech from a source speaker to mimic a particular target speaker with the intention of providing personalized voices to users of Speech Generating Devices (SGDs). This speaker mimicry (SM) capability allows us to use high-quality acoustic inventories from professional speakers and transform them to a differe...


Automatic speaker recognition as a measurement of voice imitation and conversion

Voices can be deliberately disguised by means of human imitation or voice conversion. The question arises to what extent they can be modified by using either method. In the current paper, a set of speaker identification experiments are conducted; first, analysing some prosodic features extracted from voices of professional impersonators attempting to mimic a target voice and, second, using both...


Acoustical and perceptual study of voice disguise by age modification in speaker verification

The task of speaker recognition is feasible when the speakers are co-operative or wish to be recognized. While modern automatic speaker verification (ASV) systems and some listeners are good at recognizing speakers from modal, unmodified speech, the task becomes notoriously difficult in situations of deliberate voice disguise when the speaker aims at masking his or her identity. We approach voi...


Immediate effects of vocal warm-up exercises on elementary teachers' voice

Introduction: Teachers are a large group of professional voice users who are exposed to many voice problems. Vocal warm-up exercises (VWUE) can prepare the muscles involved in vocalization before teaching and can reduce voice damage in teachers. However, limited studies have examined the effects of VWUE on teachers' voices. Therefore, the present study was conducted to investigate the immediate...



Publication date: 2005
